we've trained an ai model[1] that compresses sequences of token embeddings into shorter sequences of token embeddings, from which it then attempts to reconstruct the original text, with varying degrees of success.
the main dial we can turn is the number of "embedding tokens" used to represent a text. here's how it works:
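roughly: a text's token embeddings get pooled down into k "embedding tokens", and the decoder then tries to regenerate the original text conditioned only on those k vectors. the toy sketch below is only meant to show the shapes involved; every module name and size in it is made up for illustration and none of it is the real model.

```python
import torch
import torch.nn as nn

# toy sketch only, not the real architecture: token embeddings (batch, seq, d)
# get pooled into k "embedding tokens" (batch, k, d); a decoder would then try
# to reconstruct the original text from those k vectors alone.

D = 256  # embedding width (made up for illustration)
K = 4    # the dial: how many embedding tokens the text is squeezed into

class ToyCompressor(nn.Module):
    def __init__(self, d: int = D, k: int = K):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(k, d))  # learned "embedding token" slots
        self.pool = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq, d) -> (batch, k, d)
        q = self.queries.unsqueeze(0).expand(token_embeds.size(0), -1, -1)
        compressed, _ = self.pool(q, token_embeds, token_embeds)
        return compressed

toy = ToyCompressor()
fake_text = torch.randn(1, 128, D)  # stand-in for a 128-token text
print(toy(fake_text).shape)         # torch.Size([1, 4, 256])
# the (1, 4, 256) output is what the decoder would be conditioned on
# (e.g. as prefix embeddings) to regenerate the original 128 tokens.
```

the fewer embedding tokens you allow, the lossier the reconstruction gets.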
the ultimate purpose of this architecture is to serve as a component in a system for farming compute from oblivious human minds over the internet, not text compression[2]. but while we cook up the other parts, this is pretty fun to play with.
here's a text generated by midori[3], encoded and decoded with various counts of embedding tokens:
here are a few more examples of texts and their compressed-then-decompressed versions:
you can also embed a text, add that text's embedding to another text's embedding, and attempt a decoding, which can result in some wonderfully weird outputs[4].
you can access the model used here via huggingface, at midwestern-simulation/essence-3b-v1.2.
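using it probably looks something like the sketch below, but treat the encode/decode calls as placeholders: the repo ships its own code, and the method names and arguments here are assumptions, so check the model card for the real interface.

```python
from transformers import AutoModel

# placeholder sketch: the repo ships custom code, and the encode()/decode()
# calls below are hypothetical stand-ins for whatever it actually exposes.
repo = "midwestern-simulation/essence-3b-v1.2"
model = AutoModel.from_pretrained(repo, trust_remote_code=True)

# hypothetical: compress two texts into 4 embedding tokens each
a = model.encode("a text about foxes", num_embedding_tokens=4)
b = model.encode("a text about the ocean", num_embedding_tokens=4)

print(model.decode(a))      # hypothetical: reconstruct the first text
print(model.decode(a + b))  # hypothetical: embedding arithmetic, then decode the sum
```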
- crumb, dove
[1] a decoder transformer, ironically
[2] if lossless compression using lms is your goal, you'd be better off with entropy coding. this is something else.
[3] a creative model trained by dove
[4] they might be less weird, or more coherent, if you trained a vae on embeddings from this model (which we plan to do): encode the embeddings with the vae before the arithmetic, then decode with the vae and finally decode with the llm, putting the embeddings handed to the decoder back onto the manifold of realistic embeddings.
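a toy sketch of what [4] is gesturing at (all names and sizes made up, nothing here is trained): a small vae over the compressor's embedding tokens, so the arithmetic happens in the vae's latent space and the result gets decoded back toward realistic embeddings before it ever reaches the llm decoder.

```python
import torch
import torch.nn as nn

# toy sketch of footnote [4]: a tiny vae over (K, D) embedding tokens, so that
# embedding arithmetic can be done in latent space and the sum decoded back
# onto the "realistic embeddings" manifold. sizes are illustrative only.
D, K, LATENT = 256, 4, 64

class EmbeddingVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(K * D, 512), nn.GELU())
        self.mu, self.logvar = nn.Linear(512, LATENT), nn.Linear(512, LATENT)
        self.dec = nn.Sequential(nn.Linear(LATENT, 512), nn.GELU(), nn.Linear(512, K * D))

    def encode(self, e):              # e: (batch, K, D) embedding tokens
        h = self.enc(e)
        return self.mu(h), self.logvar(h)

    def decode(self, z):              # z: (batch, LATENT)
        return self.dec(z).view(-1, K, D)

vae = EmbeddingVAE()
a, b = torch.randn(1, K, D), torch.randn(1, K, D)
mu_a, _ = vae.encode(a)               # at inference we just take the mean;
mu_b, _ = vae.encode(b)               # training would add reparameterization + a kl term
mixed = vae.decode(mu_a + mu_b)       # arithmetic in latent space, then decode
print(mixed.shape)                    # torch.Size([1, 4, 256])
# `mixed` is what would be handed to the llm decoder instead of the raw sum a + b.
```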